Hypergraph regularized nonnegative triple decomposition for multiway data analysis

Tucker decomposition is widely used for image representation, data reconstruction, and machine learning tasks, but the calculation cost for updating the Tucker core is high. Bilevel form of triple decomposition (TriD) overcomes this issue by decomposing the Tucker core into three low-dimensional third-order factor tensors and plays an important role in the dimension reduction of data representation. TriD, on the other hand, is incapable of precisely encoding similarity relationships for tensor data with a complex manifold structure. To address this shortcoming, we take advantage of hypergraph learning and propose a novel hypergraph regularized nonnegative triple decomposition for multiway data analysis that employs the hypergraph to model the complex relationships among the raw data. Furthermore, we develop a multiplicative update algorithm to solve our optimization problem and theoretically prove its convergence. Finally, we perform extensive numerical tests on six real-world datasets, and the results show that our proposed algorithm outperforms some state-of-the-art methods.

of lower dimension in two directions. TriD performs TD on a tensor and triple decomposes the Tucker core at the same time. The number of parameters in TriD is smaller than that of TD in a substantial number of cases, so TriD is less costly than TD.
Although TriD has achieved better results in tensor data recovery experiments, it does not take into account the geometrical manifold structure of the data. In the past decade, manifold learning has been widely adopted to preserve the geometric information of original data. Cai et al. 18 explored the geometrical information by constructing a k-nearest neighbor graph and proposed the graph regularized nonnegative matrix factorization (GNMF), which demonstrated promising performance in clustering analysis. To improve the robustness of GNMF, several variants have been proposed in the literature 19-24. Li et al. 25 introduced a manifold regularization term on the core tensor and proposed a manifold regularization nonnegative Tucker decomposition (MR-NTD) method. Qiu et al. 26 proposed a graph regularized nonnegative Tucker decomposition (GNTD) method by applying Laplacian regularization to the last nonnegative factor matrix. Liu et al. 27 presented a technique known as graph regularized $L_p$ smooth NTD (GSNTD) by embedding graph regularization and an $L_p$ smooth constraint into the original NTD model. Subsequently, Wu et al. 28 proposed a manifold regularization nonnegative triple decomposition (MRNTriD) of tensor sets that takes advantage of tensor geometry information. These graph-based manifold learning methods perform well in clustering. However, they only consider the pairwise relationship between samples and ignore the high-order relationship among samples. Hypergraph learning is a good candidate for solving this problem.
Using a hypergraph to model the high-order relationship between samples can improve classification performance. Numerous methods combined with hypergraphs work well in clustering tasks. Zeng et al. 29 presented a hypergraph regularized nonnegative matrix factorization (HNMF) method. Wang et al. 30 introduced hypergraph regularization into $L_{1/2}$-NMF (HSNMF) to exploit the spectral-spatial joint structure of hyperspectral images. Huang et al. 31 constructed a sparse hypergraph for better clustering and proposed a sparse hypergraph regularized NMF (SHNMF) method. Yin et al. 32 proposed a hypergraph regularized nonnegative tensor factorization (HyperNTF) method by incorporating a hypergraph into nonnegative tensor decomposition. Zhao et al. 33 introduced a hypergraph regularized term into the framework of the nonnegative tensor ring decomposition and proposed a hypergraph regularized nonnegative tensor ring decomposition (HGNTR). To reduce computational complexity and suppress noise, they applied a low-rank approximation trick to accelerate HGNTR (LraHGNTR) 33. Huang et al. 34 designed a method to dynamically update the hypergraph and proposed a dynamic hypergraph regularized nonnegative Tucker decomposition (DHNTD) method.
To the best of our knowledge, no existing method considers higher-order relationships among data sample points in TriD. Inspired by the advantages of hypergraph learning and TriD, in this paper we present a hypergraph regularized nonnegative triple decomposition (HNTriD) model. HNTriD can explore low-dimensional parts-based representations while preserving detailed complex geometrical information from high-dimensional tensor data. We then develop an iterative multiplicative updating algorithm to solve the HNTriD model. The main contributions of this paper are as follows:
• HNTriD is a novel dimensionality reduction method that incorporates hypergraph learning into TriD. It is well suited to clustering tasks for tensor data, and the computational cost and storage requirements can be greatly reduced.
• HNTriD captures the complex connections among observed samples while retaining the structural information of the raw data during dimensionality reduction. We attribute this performance to the ability of the hypergraph regularized term to approximate the inner relationships of the original data.
• HNTriD is useful in practical applications, such as clustering tasks, because it performs well at multiway data learning and preserves the important characteristics of the data during dimensionality reduction. Experimental results on popular datasets, including COIL20, GEORGIA, MNIST, ORL, PIE, and USPS, show that HNTriD outperforms existing rival approaches in cluster analysis.
The remainder of this paper is organized as follows. Section 2 reviews some fundamental concepts, such as NTD, TriD, and hypergraph learning, that are used in the subsequent sections. Section 3 proposes the objective function of the HNTriD model and discusses the HNTriD optimization algorithm in detail, including the updating rules for the parameters of the model, the convergence analysis of the proposed method, and the computational complexity analysis of HNTriD. Section 4 presents experimental results that validate the efficacy and accuracy of our proposed method. The last section is the conclusion.

Nonnegative tensor decomposition (NTD)
TD is a popular class of methods for dimensionality reduction of high-dimensional data 7. Data collected in real life are usually nonnegative, so it makes more physical sense to impose nonnegative constraints on all factors in TD. Therefore, we focus on the nonnegative tensor decomposition (NTD). In fact, NTD is a multiway extension of nonnegative matrix factorization (NMF) 39, which imposes nonnegative constraints on the TD model 35 and preserves the multilinear structure of the data. Given a nonnegative third-order tensor $\mathcal{X} \in \mathbb{R}_+^{n_1 \times n_2 \times n_3}$, NTD expresses $\mathcal{X}$ as a nonnegative core tensor $\hat{\mathcal{X}} \in \mathbb{R}_+^{r_1 \times r_2 \times r_3}$ multiplied by three nonnegative factor matrices $U \in \mathbb{R}_+^{n_1 \times r_1}$, $V \in \mathbb{R}_+^{n_2 \times r_2}$, and $W \in \mathbb{R}_+^{n_3 \times r_3}$, and it can be formulated as

$$\mathcal{X} = \hat{\mathcal{X}} \times_1 U \times_2 V \times_3 W. \tag{1}$$

If $r_1, r_2, r_3$ are the smallest integers such that (1) holds, then we call the vector $(r_1, r_2, r_3)$ the Tucker rank. When solving for the optimal solution, we usually work with the mode-$n$ matricization, and (1) can be expressed in the following equivalent forms

$$X_{(1)} = U \hat{X}_{(1)} (W \otimes V)^\top, \quad X_{(2)} = V \hat{X}_{(2)} (W \otimes U)^\top, \quad X_{(3)} = W \hat{X}_{(3)} (V \otimes U)^\top, \tag{2}$$

where $X_{(n)}$ denotes the mode-$n$ matricization of the tensor $\mathcal{X}$ and "$\otimes$" denotes the Kronecker product of two matrices.
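The relation between the Tucker form and its mode-n matricizations can be checked numerically. The following NumPy sketch is only an illustration: it assumes the common convention in which earlier indices vary fastest inside an unfolding, and the helper names `unfold` and `tucker_reconstruct` as well as the sizes are hypothetical.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n matricization; the remaining indices are flattened with the
    earlier index varying fastest (assumed convention)."""
    return np.reshape(np.moveaxis(T, mode, 0), (T.shape[mode], -1), order="F")

def tucker_reconstruct(G, U, V, W):
    """Return G x_1 U x_2 V x_3 W for a third-order core tensor G."""
    return np.einsum("abc,ia,jb,lc->ijl", G, U, V, W)

rng = np.random.default_rng(0)
n1, n2, n3, r1, r2, r3 = 5, 4, 6, 3, 2, 2   # hypothetical sizes
G = rng.random((r1, r2, r3))                # nonnegative core
U, V, W = rng.random((n1, r1)), rng.random((n2, r2)), rng.random((n3, r3))
X = tucker_reconstruct(G, U, V, W)

# Three matricized forms of the same model, one per mode.
X1 = U @ unfold(G, 0) @ np.kron(W, V).T
X2 = V @ unfold(G, 1) @ np.kron(W, U).T
X3 = W @ unfold(G, 2) @ np.kron(V, U).T
```

Each of `X1`, `X2`, and `X3` should agree with the corresponding unfolding of `X`; forming the Kronecker products explicitly is affordable only at toy sizes such as these.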

Bilevel form of triple decomposition (TriD)
In the TD and NTD methods, the size of the core tensor grows rapidly as the order of the data increases, which may result in a high computational cost. To overcome this shortcoming, Qi et al. 17 recently proposed a new form of triple decomposition for third-order tensors, which reduces a third-order tensor to the product of three third-order factor tensors.
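Definition 1 17 Let $\hat{\mathcal{X}} = (\hat{x}_{ijl}) \in \mathbb{R}^{r_1 \times r_2 \times r_3}$ be a nonzero tensor. We say that $\hat{\mathcal{X}}$ is the triple product of three third-order tensors $\mathcal{A} \in \mathbb{R}^{r_1 \times r \times r}$, $\mathcal{B} \in \mathbb{R}^{r \times r_2 \times r}$, and $\mathcal{C} \in \mathbb{R}^{r \times r \times r_3}$, denoted by

$$\hat{\mathcal{X}} = \mathcal{A}\mathcal{B}\mathcal{C}, \tag{3}$$

where $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ are named the horizontally square tensor, the laterally square tensor, and the frontally square tensor, respectively. For $i = 1, 2, \ldots, r_1$, $j = 1, 2, \ldots, r_2$, $l = 1, 2, \ldots, r_3$, the elementwise definition of the triple product is

$$\hat{x}_{ijl} = \sum_{p,q,s=1}^{r} a_{iqs}\, b_{pjs}\, c_{pql}. \tag{4}$$

If $r \le \operatorname{mid}\{r_1, r_2, r_3\}$, where "$\operatorname{mid}\{\cdot\}$" denotes the median, we call (3) a low rank triple decomposition of $\hat{\mathcal{X}}$, and $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ are the factor tensors of $\hat{\mathcal{X}}$. The smallest value of $r$ such that (4) holds is known as the triple rank of $\hat{\mathcal{X}}$, denoted by $\operatorname{TriRank}(\hat{\mathcal{X}}) = r$. The triple rank of a zero tensor is defined as zero.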
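As a small illustration of Definition 1, the triple product can be written as a single index contraction. The sketch below assumes the elementwise rule $x_{ijl} = \sum_{p,q,s} a_{iqs} b_{pjs} c_{pql}$ from the triple decomposition of Qi et al. 17; the function name `triple_product` and the sizes are hypothetical.

```python
import numpy as np

def triple_product(A, B, C):
    """Triple product X = ABC with x_ijl = sum_{p,q,s} a_iqs * b_pjs * c_pql."""
    return np.einsum("iqs,pjs,pql->ijl", A, B, C)

rng = np.random.default_rng(0)
r1, r2, r3, r = 4, 3, 5, 2      # hypothetical sizes
A = rng.random((r1, r, r))      # horizontally square factor tensor
B = rng.random((r, r2, r))      # laterally square factor tensor
C = rng.random((r, r, r3))      # frontally square factor tensor
X = triple_product(A, B, C)     # shape (r1, r2, r3)
```

The three factor tensors together hold $(r_1 + r_2 + r_3) r^2$ entries, compared with $r_1 r_2 r_3$ entries for a dense tensor of the same size.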
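If a third-order tensor is decomposed by TD and its Tucker core is simultaneously triple decomposed into three tensors, we obtain a bilevel form of the triple decomposition, as defined below.
Definition 2 17 Based on the definition of NTD in (1), suppose the core tensor $\hat{\mathcal{X}}$ has a triple decomposition $\hat{\mathcal{X}} = \mathcal{A}\mathcal{B}\mathcal{C}$, where $\operatorname{TriRank}(\hat{\mathcal{X}}) = r$, $\mathcal{A} \in \mathbb{R}_+^{r_1 \times r \times r}$, $\mathcal{B} \in \mathbb{R}_+^{r \times r_2 \times r}$, and $\mathcal{C} \in \mathbb{R}_+^{r \times r \times r_3}$. Then $\mathcal{X}$ can be represented as

$$\mathcal{X} = (\mathcal{A}\mathcal{B}\mathcal{C}) \times_1 U \times_2 V \times_3 W. \tag{5}$$

We call (5) a bilevel form of the triple decomposition of $\mathcal{X}$, which is referred to as TriD. $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ are the inner factor tensors.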
From (1), the minimum number of parameters of the NTD of the third-order tensor $\mathcal{X}$ is $n_1 r_1 + n_2 r_2 + n_3 r_3 + r_1 r_2 r_3$, where $(r_1, r_2, r_3)$ is the Tucker rank of $\mathcal{X}$. On the other hand, the number of parameters of TriD is $n_1 r_1 + n_2 r_2 + n_3 r_3 + (r_1 + r_2 + r_3) r^2$, where $r$ is the triple rank of the core tensor. Generally, the triple rank of the core tensor is far less than each component of the Tucker rank of the original tensor $\mathcal{X}$. Hence there are substantial cases where the number of parameters of TriD is strictly less than that of TD.
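The two parameter counts are easy to compare for concrete sizes. The following sketch simply evaluates both formulas; the function names and the example sizes are hypothetical and are not taken from the experiments.

```python
def ntd_params(n, tucker_rank):
    """Parameters of NTD: three factor matrices plus a dense core."""
    (n1, n2, n3), (r1, r2, r3) = n, tucker_rank
    return n1 * r1 + n2 * r2 + n3 * r3 + r1 * r2 * r3

def trid_params(n, tucker_rank, r):
    """Parameters of TriD: three factor matrices plus three inner factor tensors."""
    (n1, n2, n3), (r1, r2, r3) = n, tucker_rank
    return n1 * r1 + n2 * r2 + n3 * r3 + (r1 + r2 + r3) * r * r

n = (32, 32, 400)           # hypothetical: 400 images of size 32 x 32
tucker_rank = (10, 10, 10)  # hypothetical Tucker rank
count_ntd = ntd_params(n, tucker_rank)
count_trid = trid_params(n, tucker_rank, r=3)
```

With these sizes the dense core contributes 1000 parameters, whereas the three inner factor tensors contribute 270. The saving disappears when $r$ approaches the components of the Tucker rank.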

Hypergraph learning
To improve clustering performance, it is necessary to preserve the internal hidden geometric structure of the data, which can be described by hypergraph learning. Given $n_3$ grayscale images $\{X_1, X_2, \ldots, X_{n_3}\}$, each grayscale image can be viewed as a matrix of size $n_1 \times n_2$. These $n_3$ matrices are stacked to form a tensor $\mathcal{X}$ of size $n_1 \times n_2 \times n_3$. The $i$-th frontal slice $\mathcal{X}(:, :, i)$ of the tensor $\mathcal{X}$ is exactly the matrix $X_i$. In addition, we can build a hypergraph $(V, E; S)$ to encode the geometrical structure of the raw data 40. Each node $v_i \in V$ represents a sample $X_i$, and every hyperedge $e_i \in E$ consists of several nodes that are grouped according to some constraints. For each vertex $v_i$, we form a hyperedge $e_i$ from $v_i$ and the $k$ nearest neighbours of $v_i$. Each hyperedge $e_i$ carries a weight $s(e_i)$ that measures the similarity of the image nodes it contains. The weight $s(e_i)$ can be calculated as follows:

$$s(e_i) = \sum_{v_j \in e_i} \exp\left(-\frac{\|X_i - X_j\|_F^2}{\sigma^2}\right),$$

where $\sigma = \frac{1}{k n_3} \sum_{i=1}^{n_3} \sum_{j \in e_i} \|X_i - X_j\|_F$ denotes the mean distance among the vertices within the hyperedges. In particular, we can construct an incidence matrix $H$ as follows:

$$H(v_i, e_q) = \begin{cases} 1, & v_i \in e_q, \\ 0, & \text{otherwise}. \end{cases}$$

The degrees of a node $v_i$ and a hyperedge $e_q$ can be expressed as

$$d(v_i) = \sum_{e_q \in E} s(e_q) H(v_i, e_q) \quad \text{and} \quad d(e_q) = \sum_{v_i \in V} H(v_i, e_q),$$

respectively. We use $D_v$, $D_e$, and $S_e$ to denote the diagonal matrices whose diagonal elements are $d(v_i)$, $d(e_q)$, and $s(e_q)$, respectively.
If two samples $X_i$ and $X_j$ are similar in the original raw observation, it is reasonable to assume that their low-dimensional representations $w_i$ and $w_j$ are close to each other. Following the practical application and theoretical analysis of hypergraphs 31-33, we take $w_i$ and $w_j$ to be the vectors associated with the nodes $v_i$ and $v_j$. Then the following expression can be used to measure the clustering similarity of the original data $X_i$ and $X_j$ in the low-dimensional approximation:

$$\frac{1}{2} \sum_{e \in E} \sum_{v_i, v_j \in e} \frac{s(e)}{d(e)} \|w_i - w_j\|_2^2 = \operatorname{Tr}(W^\top L_{hyper} W), \quad L_{hyper} = D_v - H S_e D_e^{-1} H^\top.$$
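The hypergraph construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the heat-kernel hyperedge weight $s(e_i) = \sum_{v_j \in e_i} \exp(-\|X_i - X_j\|_F^2 / \sigma^2)$ and the unnormalized Laplacian $D_v - H S_e D_e^{-1} H^\top$, and the function name `knn_hypergraph` and the random data are hypothetical.

```python
import numpy as np

def knn_hypergraph(X, k):
    """Build a k-NN hypergraph from samples stored as frontal slices X[:, :, i].

    Returns the incidence matrix H (vertices x hyperedges), the hyperedge
    weights s, and the Laplacian L = D_v - H S_e D_e^{-1} H^T.
    """
    n3 = X.shape[2]
    flat = X.reshape(-1, n3).T                       # one vectorized sample per row
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))          # pairwise Frobenius distances
    rank = dist.copy()
    np.fill_diagonal(rank, -1.0)                     # each vertex ranks first for itself
    H = np.zeros((n3, n3))
    for i in range(n3):
        members = np.argsort(rank[i])[: k + 1]       # v_i and its k nearest neighbours
        H[members, i] = 1.0                          # hyperedge e_i is column i
    sigma = (H * dist).sum() / (k * n3)              # mean within-hyperedge distance
    s = (H * np.exp(-dist ** 2 / sigma ** 2)).sum(axis=0)   # assumed weight formula
    d_v = H @ s                                      # vertex degrees
    d_e = H.sum(axis=0)                              # hyperedge degrees
    L = np.diag(d_v) - H @ np.diag(s / d_e) @ H.T
    return H, s, L

rng = np.random.default_rng(0)
X = rng.random((4, 3, 8))        # eight hypothetical 4 x 3 "images"
H, s, L = knn_hypergraph(X, k=2)
```

The dense pairwise-distance array is only practical for small sample counts; a nearest-neighbour search structure would be the natural replacement at scale.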

Hypergraph regularized nonnegative triple decomposition (HNTriD)
TriD is a significant tensor data dimensional reduction algorithm, but it ignores higher-order relationships between the inner parts of raw data and does not consider nonnegative constraints, which may result in a big gap in data clustering performance.Modeling the high-order relationship among samples will help to improve performance.Hypergraph learning is an effective tool for illustrating the inner complex connections of multiway data.By incorporating the hypergraph Laplacian regularized term into the bilevel form of triple decomposition, we get a new method named HNTriD, as shown in the following subsection.

Objective function of HNTriD
Suppose $\mathcal{X} \in \mathbb{R}_+^{n_1 \times n_2 \times n_3}$ is a third-order nonnegative tensor formed by stacking $n_3$ second-order samples $X_i \in \mathbb{R}_+^{n_1 \times n_2}$ $(i = 1, 2, \ldots, n_3)$ along the third mode, where each $X_i$ represents an original sample of the raw data. Note that $X_{(3)} = [\operatorname{vec}(X_1), \operatorname{vec}(X_2), \ldots, \operatorname{vec}(X_{n_3})]^\top$. Unfolding $\mathcal{X}$ along the third mode, we can rewrite (1) in its matricized form, which is the third equation of (2):

$$X_{(3)} = W \hat{X}_{(3)} (V \otimes U)^\top.$$

Hence the $k$-th row $w_k$ of $W \in \mathbb{R}_+^{n_3 \times r_3}$ can be regarded as a low-dimensional representation of the sample $X_k$ under the basis $(V \otimes U) \hat{X}_{(3)}^\top$. To improve the representation ability for multiway data and the operational efficiency, we propose the following HNTriD model, which incorporates the hypergraph constraint into the TriD model. For a given nonnegative tensor $\mathcal{X} \in \mathbb{R}_+^{n_1 \times n_2 \times n_3}$, the model reads

$$\min_{\mathcal{A}, \mathcal{B}, \mathcal{C}, U, V, W \ge 0} f_{HNTriD} = \|\mathcal{X} - (\mathcal{A}\mathcal{B}\mathcal{C}) \times_1 U \times_2 V \times_3 W\|_F^2 + \alpha \operatorname{Tr}(W^\top L_{hyper} W), \tag{7}$$

where $L_{hyper} = D_v - H S_e D_e^{-1} H^\top$ is the hypergraph Laplacian. The first and second parts of (7) are the reconstruction error term and the hypergraph regularized term, respectively. The reconstruction error term in (7) can be seen as a deep nonnegative tensor decomposition with two layers. The first layer is a TD of the form

$$\mathcal{X} \approx \mathcal{Y} \times_1 U \times_2 V \times_3 W,$$

where $\mathcal{Y} \times_1 U \times_2 V$ denotes the set of multilinear bases of the original data $\mathcal{X}$ and $W$ denotes the encoding matrix of $\mathcal{X}$ under this set of multilinear bases. The second layer is the triple decomposition

$$\mathcal{Y} = \mathcal{A}\mathcal{B}\mathcal{C},$$

where each factor tensor carries a different meaning in different application problems. For example, in social networks and transportation data, different characteristics such as temporal stability, spatial correlation, and traffic periodicity may be reflected in each of these three factors. This two-layer decomposition not only reduces the computation required to update the core tensor, but also combines the respective advantages of the TD and the triple decomposition. The parameter $\alpha$ is a trade-off parameter that controls the importance of the hypergraph regularization term. Since the hypergraph regularization term preserves the multilateral relationships among the data, we establish model (7).
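To make the two parts of the objective concrete, the sketch below evaluates a reconstruction error plus a Laplacian regularizer for given factors. It assumes the form $\|\mathcal{X} - (\mathcal{A}\mathcal{B}\mathcal{C}) \times_1 U \times_2 V \times_3 W\|_F^2 + \alpha \operatorname{Tr}(W^\top L W)$ without additional scaling constants, and it uses a centering matrix as a stand-in for the hypergraph Laplacian; all names and sizes are hypothetical.

```python
import numpy as np

def hntrid_reconstruct(A, B, C, U, V, W):
    """Bilevel model (ABC) x_1 U x_2 V x_3 W written as one contraction."""
    return np.einsum("aqs,pbs,pqc,ia,jb,lc->ijl", A, B, C, U, V, W, optimize=True)

def hntrid_objective(X, A, B, C, U, V, W, L, alpha):
    """Reconstruction error plus alpha * Tr(W^T L W) (assumed form of the model)."""
    resid = X - hntrid_reconstruct(A, B, C, U, V, W)
    return float(np.sum(resid ** 2) + alpha * np.trace(W.T @ L @ W))

rng = np.random.default_rng(0)
n1, n2, n3, r1, r2, r3, r = 5, 4, 6, 3, 3, 2, 2      # hypothetical sizes
A, B, C = rng.random((r1, r, r)), rng.random((r, r2, r)), rng.random((r, r, r3))
U, V, W = rng.random((n1, r1)), rng.random((n2, r2)), rng.random((n3, r3))
X = hntrid_reconstruct(A, B, C, U, V, W)             # data that the factors fit exactly
L = np.eye(n3) - np.ones((n3, n3)) / n3              # stand-in PSD Laplacian, not a real hypergraph
f_fit_only = hntrid_objective(X, A, B, C, U, V, W, L, alpha=0.0)
f_with_reg = hntrid_objective(X, A, B, C, U, V, W, L, alpha=2.0)
```

Because the data are generated from the factors, only the regularizer contributes to `f_with_reg`, which makes the role of $\alpha$ easy to see.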
HNTriD is used to represent high-dimensional data in a low-dimensional form.To better show the implications of HNTriD, we draw a flowchart to provide a concise overview of the implementation procedure in Figure 2.

Optimization algorithm
When the parameters $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, $U$, $V$, and $W$ are considered simultaneously, the objective function $f_{HNTriD}$ in (7) is not convex, so obtaining the global optimal solution is difficult. Instead, we introduce an iterative algorithm that seeks a local minimum. To simplify the derivation, we rely on two lemmas (Lemma 1 and Lemma 2) that are used frequently.

Solutions of inner factor tensors
When the variables $\mathcal{B}$, $\mathcal{C}$, $U$, $V$, and $W$ are fixed, the objective function of HNTriD reduces to the subproblem (10) in $\mathcal{A}$ alone, whose Lagrange function is (11). We work with the mode-1 matricization of (11), in which $F_{(1)}$ is the unfolding of the tensor $\mathcal{F}$ defined in (8). By Lemma 2, we obtain the gradient of $L_{\mathcal{A}}$ with respect to $A_{(1)}$. According to 41, we can then use the Karush-Kuhn-Tucker (KKT) conditions, namely the stationarity condition $\partial L_{\mathcal{A}} / \partial A_{(1)} = 0$ together with the complementary slackness condition that the Hadamard product ($\circledast$) of the Lagrange multiplier and $A_{(1)}$ vanishes. Eliminating the multiplier from these two conditions gives the updating rule (12) for $\mathcal{A}$. Using the same technique, the updating rules (13) and (14) for the inner factor tensors $\mathcal{B}$ and $\mathcal{C}$ are obtained.

Solutions of factor matrices
When the variables $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, $U$, and $V$ are fixed, the objective function of HNTriD reduces to the subproblem (15) in $W$ alone. Writing the Lagrange function of (15) in terms of the mode-3 matricizations of $\mathcal{X}$ and of the core tensor gives (16). By Lemma 2, we obtain the gradient of $L_W$ with respect to $W$. Using the Karush-Kuhn-Tucker (KKT) conditions, namely $\partial L_W / \partial W = 0$ together with the complementary slackness condition that the Hadamard product of the Lagrange multiplier and $W$ vanishes, we obtain the updating rule (17) for $W$. Using the same technique, the updating rules (18) and (19) for the factor matrices $U$ and $V$ are obtained, respectively.
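The structure of a multiplicative rule for $W$ can be illustrated at the matrix level. The sketch below is not the paper's rule (17) verbatim: it assumes the subproblem $\min_{W \ge 0} \|X_{(3)} - W Z\|_F^2 + \alpha \operatorname{Tr}(W^\top (D_v - S_h) W)$, where $Z$ stands for the fixed product of the unfolded core and $(V \otimes U)^\top$ and $S_h = H S_e D_e^{-1} H^\top$, and it applies the usual split of the gradient into nonnegative parts. All names and the random data are hypothetical.

```python
import numpy as np

def update_W(W, X3, Z, S_h, d_v, alpha, eps=1e-12):
    """One multiplicative step: negative gradient part over positive gradient part."""
    numer = X3 @ Z.T + alpha * (S_h @ W)
    denom = W @ (Z @ Z.T) + alpha * (d_v[:, None] * W) + eps
    return W * numer / denom

def objective_W(W, X3, Z, S_h, d_v, alpha):
    """Value of the assumed W-subproblem."""
    L = np.diag(d_v) - S_h
    return float(np.sum((X3 - W @ Z) ** 2) + alpha * np.trace(W.T @ L @ W))

rng = np.random.default_rng(0)
n3, r3, m = 8, 3, 20                                   # m plays the role of n1 * n2
X3, Z = rng.random((n3, m)), rng.random((r3, m))
H = (rng.random((n3, n3)) < 0.3).astype(float)         # random stand-in incidence matrix
np.fill_diagonal(H, 1.0)                               # every hyperedge contains its own vertex
s = rng.random(n3)                                     # stand-in hyperedge weights
S_h = H @ np.diag(s / H.sum(axis=0)) @ H.T
d_v = H @ s
W = rng.random((n3, r3)) + 0.1
values = [objective_W(W, X3, Z, S_h, d_v, alpha=0.5)]
for _ in range(50):
    W = update_W(W, X3, Z, S_h, d_v, alpha=0.5)
    values.append(objective_W(W, X3, Z, S_h, d_v, alpha=0.5))
```

Because every factor in the ratio is nonnegative, $W$ stays nonnegative without an explicit projection, and the recorded values should not increase from one step to the next.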

Theoretical convergence analysis
In this subsection, the convergence of the iterative updating algorithm is investigated. Our proof makes use of an auxiliary function, which is defined as follows.
Definition 3 42 $G(x, x')$ is an auxiliary function for $F(x)$ if the conditions

$$G(x, x') \ge F(x), \quad G(x, x) = F(x)$$

are satisfied.
The auxiliary function is helpful because of the following key property.
Lemma 3 42 If $G$ is an auxiliary function of $F$, then $F$ is non-increasing under the update

$$x^{t+1} = \arg\min_{x} G(x, x^t). \tag{20}$$

Next, we show that the update rule for $A_{(1)}$ in (12) is exactly the update in (20) with a proper auxiliary function. Considering the entry $[A_{(1)}]_{ij}$ in the $i$th row and $j$th column of $A_{(1)}$, we use $F_{ij}$ to denote the part of the objective function (7) that is relevant only to $[A_{(1)}]_{ij}$, and we write $F'_{ij}$ and $F''_{ij}$ for its first and second derivatives, respectively.

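Lemma 4 The function $G(x, [A_{(1)}]^t_{ij})$ given in (21) is an auxiliary function for $F_{ij}$, which is relevant only to $[A_{(1)}]_{ij}$.
Proof Since $G(x, x) = F_{ij}(x)$ is obvious, we only need to show that $G(x, [A_{(1)}]^t_{ij}) \ge F_{ij}(x)$ holds. To achieve this, we consider the Taylor series expansion of $F_{ij}(x)$ around $[A_{(1)}]^t_{ij}$,

$$F_{ij}(x) = F_{ij}([A_{(1)}]^t_{ij}) + F'_{ij}([A_{(1)}]^t_{ij})\,(x - [A_{(1)}]^t_{ij}) + \frac{1}{2} F''_{ij}\,(x - [A_{(1)}]^t_{ij})^2. \tag{22}$$

Comparing (21) with (22), we see that $G(x, [A_{(1)}]^t_{ij}) \ge F_{ij}(x)$ is satisfied as long as the inequality (23) holds. Since (23) holds, $G(x, [A_{(1)}]^t_{ij}) \ge F_{ij}(x)$ is satisfied. This completes the proof.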
Theorem 1 The objective function of the HNTriD model (7) is non-increasing under the updating rule for $A_{(1)}$ given in (12).
Proof Replacing the auxiliary function $G(x, x^t)$ in (20) with (21) yields the update (24). We can see that (24) agrees with (12), and Lemma 4 guarantees that (21) is an auxiliary function of $F_{ij}$.
Based on this, in conjunction with Lemma 3, we can get that f HNTriD is non-increasing under the update rule of (12).The proof is then finished.
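The auxiliary-function argument can be illustrated on the simplest nonnegative least-squares problem. The sketch below is not the paper's function (21): it uses the classical construction for $F(h) = \|v - M h\|^2$ with nonnegative $M$, in which the Hessian is replaced by the diagonal matrix with entries $(M^\top M h^t)_a / h^t_a$, and it checks numerically the two conditions of Definition 3 and the monotonicity stated in Lemma 3. All names and data are hypothetical.

```python
import numpy as np

def F(h, M, v):
    """Least-squares objective ||v - M h||^2."""
    return float(np.sum((v - M @ h) ** 2))

def G(h, h_t, M, v):
    """Auxiliary function: Taylor expansion at h_t with a diagonal majorizer."""
    grad = 2.0 * (M.T @ (M @ h_t) - M.T @ v)
    K = (M.T @ (M @ h_t)) / h_t              # diagonal of the majorizing matrix
    d = h - h_t
    return float(F(h_t, M, v) + grad @ d + np.sum(K * d * d))

def mm_step(h_t, M, v):
    """Minimizer of G(., h_t), which is a multiplicative update."""
    return h_t * (M.T @ v) / (M.T @ (M @ h_t))

rng = np.random.default_rng(0)
M, v = rng.random((6, 3)), rng.random(6)
h = rng.random(3) + 0.1
values = [F(h, M, v)]
for _ in range(20):
    h = mm_step(h, M, v)
    values.append(F(h, M, v))
```

Setting the gradient of `G` with respect to `h` to zero gives exactly `mm_step`, which is why a multiplicative rule can be read as the update (20) for a suitable auxiliary function.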
We now show that the update for $W$ given in (17) coincides with the update (20) for an appropriate auxiliary function. Considering the entry $W_{ij}$ in the $i$th row and $j$th column of $W$, we use $\hat{F}_{ij}$ to denote the part of the objective function (7) that is relevant only to $W_{ij}$, and we write $\hat{F}'_{ij}$ and $\hat{F}''_{ij}$ for its first and second derivatives, respectively.

Lemma 5
The function $\hat{G}(x, W^t_{ij})$ given in (25) is an auxiliary function for $\hat{F}_{ij}$, which is relevant only to $W_{ij}$.
Proof Since $\hat{G}(x, x) = \hat{F}_{ij}(x)$ is obvious, we only need to show that the condition $\hat{G}(x, W^t_{ij}) \ge \hat{F}_{ij}(x)$ holds. To achieve this, we consider the Taylor series expansion (26) of $\hat{F}_{ij}(x)$ around $W^t_{ij}$. Combining (25) with (26), we find that $\hat{G}(x, W^t_{ij}) \ge \hat{F}_{ij}(x)$ holds as long as an inequality analogous to (23) is satisfied, which is the case here. This completes the proof.

Theorem 2
The objective function of the HNTriD model (7) is non-increasing under the updating rule for $W$ given in (17).
Proof Using (25) to replace $G(x, x^t)$ in (20), we obtain the update (28), which is consistent with (17). Lemma 5 ensures that (25) is an auxiliary function of $\hat{F}_{ij}$, which combined with Lemma 3 implies that $f_{HNTriD}$ is non-increasing under the update rule (17). This completes the proof.
Applying the same techniques to the parameters $\mathcal{B}$, $\mathcal{C}$, $U$, and $V$, we can verify the convergence of their update rules in the same way. To summarize, $f_{HNTriD}$ is non-increasing under each of the update rules for the inner factor tensors and factor matrices $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, $U$, $V$, and $W$ while the others are fixed. Before applying our algorithm to real-world datasets for clustering tasks, it is necessary to simplify the calculation formulas of the parameters $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, $U$, $V$, and $W$, as described in Remark 1.
Remark 1 In the form given above, each of the updating rules of $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, $U$, $V$, and $W$ requires Kronecker products, which are costly in storage. To simplify the updating procedure for these parameters, we take advantage of the properties of the mode-$n$ unfolding, which avoid forming the Kronecker products explicitly. As a result, (12) and (18) can be rewritten as (29) and (30), (13) and (19) can be computed as (31) and (32), and (14) and (17) can be rewritten as (33) and (34), respectively.
Hence, the learning rules for the objective function are obtained via the multiplicative update methods described above. Specifically, we randomly initialize the inner factor tensors $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$ and the factor matrices $U$, $V$, and $W$, and then update them iteratively by (29), (31), (33), (30), (32), and (34). At the end of each iteration, we record the objective value and examine convergence; the iterations stop when the stopping criterion is met or the maximum number of iterations is reached. The pseudo-code for HNTriD is given in Algorithm 1.

Algorithm 1 HNTriD
Input: Data tensor X. The number of nearest neighbors k. The algorithm parameters r_1, r_2, r_3, r, and regularization parameter α. The stopping criterion ε, and the maximum number of iterations maxiter. Let [f_HNTriD]^0 = 0.
Output: Inner factor tensors A, B, C, and factor matrices U, V, W.
  Construct the hypergraph with k nearest neighbors.
  Initialize A, B, C, U, V, W randomly.
  for t = 1, 2, ..., maxiter do
    Compute F as (8) from the updated tensors B and C
    Update the low-rank tensor A as (29)
    Update the factor matrix U as (30)
    Compute G as (8) from the updated tensors A and C
    Update the low-rank tensor B as (31)
    Update the factor matrix V as (32)
    Compute H as (8) from the updated tensors A and B
    Update the low-rank tensor C as (33)
    Update the factor matrix W as (34)
    Compute [f_HNTriD]^t from the updated tensors A, B, C and factor matrices U, V, W
    If the stopping criterion with tolerance ε is met, break
  end for
Return A, B, C, U, V, W.
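The overall loop can be sketched in NumPy under some simplifying assumptions. This is not the authors' implementation: instead of the simplified rules (29)-(34), each block uses the generic multiplicative rule for a least-squares model that is linear in that block (contraction of the data with the other factors, divided by the same contraction of the current reconstruction), the hypergraph terms are added to the $W$ update only, and the stopping test on the change of the objective value is an assumption. The names `hntrid_sketch`, `S_h` (for $H S_e D_e^{-1} H^\top$), and `d_v` (vertex degrees) are hypothetical.

```python
import numpy as np

SUBS = ["aqs", "pbs", "pqc", "ia", "jb", "lc"]       # indices of A, B, C, U, V, W

def reconstruct(f):
    """(ABC) x_1 U x_2 V x_3 W for the factor list f = [A, B, C, U, V, W]."""
    return np.einsum(",".join(SUBS) + "->ijl", *f, optimize=True)

def contract(T, f, k):
    """Contract a data-sized tensor T with every factor except the k-th one."""
    subs = ["ijl"] + [s for m, s in enumerate(SUBS) if m != k]
    ops = [T] + [x for m, x in enumerate(f) if m != k]
    return np.einsum(",".join(subs) + "->" + SUBS[k], *ops, optimize=True)

def hntrid_sketch(X, ranks, r, S_h, d_v, alpha, maxiter=200, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    (n1, n2, n3), (r1, r2, r3) = X.shape, ranks
    shapes = [(r1, r, r), (r, r2, r), (r, r, r3), (n1, r1), (n2, r2), (n3, r3)]
    f = [rng.random(s) + 0.1 for s in shapes]        # random positive initialization
    L = np.diag(d_v) - S_h
    history = []
    for _ in range(maxiter):
        for k in (0, 3, 1, 4, 2, 5):                 # order A, U, B, V, C, W
            numer = contract(X, f, k)
            denom = contract(reconstruct(f), f, k)
            if k == 5:                               # hypergraph terms act on W only
                numer = numer + alpha * (S_h @ f[5])
                denom = denom + alpha * (d_v[:, None] * f[5])
            f[k] = f[k] * numer / (denom + 1e-12)
        W = f[5]
        val = float(np.sum((X - reconstruct(f)) ** 2) + alpha * np.trace(W.T @ L @ W))
        history.append(val)
        if len(history) > 1 and abs(history[-2] - val) < tol:   # assumed stopping test
            break
    return f, history
```

A typical call passes the data tensor, the ranks `(r1, r2, r3)`, the triple rank `r`, and the hypergraph quantities, and then clusters the rows of the returned `W`. The contractions here are recomputed from scratch in every block, so the sketch does not reflect the cost analysis of the simplified rules.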

Computational complexity analysis
In this subsection, we analyze the computational complexity of the proposed HNTriD model. First, we consider the cost of the tensor-tensor products in (8): computing the tensors $\mathcal{F}$, $\mathcal{G}$, and $\mathcal{H}$ takes $O(r^3 r_2 r_3)$, $O(r^3 r_1 r_3)$, and $O(r^3 r_1 r_2)$ operations, respectively. It requires $O(n_1 r_1^2 + n_2 r_2^2 + n_3 r_3^2)$ operations to calculate the symmetric matrices $U^\top U$, $V^\top V$, and $W^\top W$. Once the required mode-1 unfoldings are available, the computational cost of each term in the update rule is $O(n_1 r^2 r_1)$, and the elementwise part of the update rule of $\mathcal{A}$ in (29) costs about $O(r^2 r_1)$. Assume that the integers $r_1$, $r_2$, $r_3$, and $r$ are of the same order of magnitude and are much smaller than $n_1$, $n_2$, and $n_3$. Under this assumption, the costs of the update rules of $\mathcal{A}$ in (29) and $U$ in (30), of $\mathcal{B}$ in (31) and $V$ in (32), and of $\mathcal{C}$ in (33) and $W$ in (34) can each be estimated in the same way, and summing the three estimates gives the total computational cost of the HNTriD algorithm.

Experiments
To validate our proposed HNTriD algorithm for clustering data with dimensionality reduction, we run experiments on six popular datasets and compare the results of model (7) with those of related state-of-the-art methods, including NMF 42, GNMF 18, HNMF 29, HSNMF 30, SHNMF 31, HGNTR 33, LraHGNTR 33, HyperNTF 32, and TriD 17. All simulations are performed on a desktop computer equipped with an Intel(R) Core(TM) i5-10400F CPU at 2.90 GHz and 16 GB of memory, running MATLAB 2015a on Windows 10.

Evaluation metrics
Clustering analysis groups samples using only the sample data themselves, with the aim of assigning different objects to different groups. A clustering method is judged by whether objects within a group are similar to each other while objects in different groups differ: the greater the similarity within groups and the greater the difference between groups, the better the clustering. ACC, NMI, and PUR are widely used assessment criteria 44,45 for clustering algorithms. The accuracy (ACC) is defined as

$$\mathrm{ACC} = \frac{1}{n} \sum_{i=1}^{n} \delta(x_i, \operatorname{map}(\hat{x}_i)),$$

where $n$ is the number of samples in the dataset, and $\hat{x}_i$ and $x_i$ denote the cluster label and the true label of the $i$th sample, respectively. The symbol $\operatorname{map}(\cdot)$ denotes the mapping function that matches the cluster labels to the true labels. The symbol $\delta(\cdot, \cdot)$ is the delta function

$$\delta(x, y) = \begin{cases} 1, & x = y, \\ 0, & \text{otherwise}. \end{cases}$$

In general, the agreement between two clusterings can be measured with the mutual information (MI), which is widely used in clustering applications. Given two discrete random variables $\hat{X}$ and $X$, which stand for the cluster label set and the true label set, let $\hat{x}$ and $x$ be arbitrary elements of $\hat{X}$ and $X$, respectively. Then the MI is measured by

$$\mathrm{MI}(\hat{X}, X) = \sum_{\hat{x} \in \hat{X},\, x \in X} p(\hat{x}, x) \log \frac{p(\hat{x}, x)}{p(\hat{x})\, p(x)},$$

where $p(\hat{x})$ and $p(x)$ are the marginal probability distribution functions of the samples, and $p(\hat{x}, x)$ is the joint probability distribution function of $\hat{X}$ and $X$, that is, the probability that an object belongs to cluster $\hat{x}$ and category $x$ at the same time. To give the score an upper bound, we take the normalized mutual information (NMI) as one of the evaluation criteria. It normalizes $\mathrm{MI}(\hat{X}, X)$ by $T(\hat{X})$ and $T(X)$, the entropies of the cluster label set $\hat{X}$ and the true label set $X$, so that $\mathrm{NMI}(\hat{X}, X)$ ranges from 0 to 1. The purity (PUR) of a clustering algorithm is a simple assessment that only requires the proportion of correctly clustered samples to the total. In other words, PUR measures the degree of correctness of a clustering, and the PUR score of a clustering is a weighted sum of the PUR values of the respective clusters. Here $\hat{X} = (\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_k)$ is the set of clusters, where $\hat{x}_i$ denotes the $i$th cluster; $X = (x_1, x_2, \ldots, x_n)$ is the original dataset to be clustered, where $x_i$ represents the $i$th original object; $n$ is the total number of objects to be clustered; and $|\cdot|$ denotes the cardinality of a set.
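The three criteria can be computed from a contingency table of cluster labels against true labels. The sketch below is an illustration under stated assumptions: the matching function $\operatorname{map}(\cdot)$ is found by brute force over permutations (adequate only for a small number of clusters; an assignment solver would normally be used), NMI is normalized by the larger of the two entropies, which is one common convention and may differ from the one used in the experiments, and purity is taken as the sum of the dominant-class counts divided by $n$. All function names are hypothetical.

```python
import numpy as np
from itertools import permutations

def contingency(pred, true):
    """Counts of samples for every (cluster, class) pair."""
    pred, true = np.asarray(pred), np.asarray(true)
    p_ids, t_ids = np.unique(pred), np.unique(true)
    M = np.zeros((len(p_ids), len(t_ids)))
    for a, p in enumerate(p_ids):
        for b, t in enumerate(t_ids):
            M[a, b] = np.sum((pred == p) & (true == t))
    return M

def accuracy(pred, true):
    """ACC with the best one-to-one matching of clusters to classes."""
    M = contingency(pred, true)
    if M.shape[0] > M.shape[1]:
        M = M.T
    k_small, k_big = M.shape
    best = max(sum(M[a, perm[a]] for a in range(k_small))
               for perm in permutations(range(k_big), k_small))
    return best / len(true)

def purity(pred, true):
    """Each cluster is credited with its most frequent true class."""
    return contingency(pred, true).max(axis=1).sum() / len(true)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))

def mutual_information(pred, true):
    P = contingency(pred, true) / len(true)
    outer = P.sum(axis=1, keepdims=True) @ P.sum(axis=0, keepdims=True)
    nz = P > 0
    return float(np.sum(P[nz] * np.log(P[nz] / outer[nz])))

def nmi(pred, true):
    """MI divided by the larger entropy (assumed normalization)."""
    denom = max(entropy(pred), entropy(true))
    return mutual_information(pred, true) / denom if denom > 0 else 0.0

true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])      # hypothetical ground truth
pred = np.array([1, 1, 1, 2, 2, 0, 0, 0, 0])      # hypothetical clustering
scores = (accuracy(pred, true), nmi(pred, true), purity(pred, true))
```

In the toy example one sample of the second class is placed in the wrong cluster, so ACC and PUR both drop below one while remaining invariant to the arbitrary numbering of the clusters.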

Algorithms for comparison
To assess the clustering performance, we compare the proposed HNTriD model with the following state-of-the-art clustering algorithms.
• NMF 42 : It incorporates nonnegative constraint into two factor matrices decomposed from the original matrix.
• GNMF 18 : It imposes the graph constraint to the coefficient matrix of the NMF method.
• HNMF 29 : It incorporates the hypergraph constraint into the coefficient matrix of the NMF method.
• HSNMF 30 : It imposes the hypergraph constraint on the coefficient matrix based on the L 1/2 -NMF method.
• SHNMF 31 : It takes the sparse hypergraph as a regularization and adds it to the NMF framework.
• HGNTR 33 : It includes the hypergraph constraint on the last TR core tensor and a nonnegative constraint on TR factor tensors.
• LraHGNTR 33 : It is the low-rank approximation of HGNTR.
• HyperNTF 32 : It imposes a hypergraph constraint on the last factor matrix of the CP model and limits all factor matrices to be nonnegative.
• TriD 17 : It is a bilevel form of the triple decomposition of a third-order tensor.

Parameters selection
To achieve the best performance, some critical parameters in the experimental simulations need to be tuned. In all tests, we let $\epsilon = 10^{-5}$ and set the maximum number of iterations to 1000 unless otherwise specified. The regularization parameter $\alpha$ is chosen from the grid $\{10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 100, 1000\}$, and the number of nearest neighbors $k$ is chosen from $\{3, 4, 5, 6, 7\}$. The parameters $r_1$ and $r_2$ are integers empirically chosen from $\{3, 4, \ldots, 32\}$, and the integer $r$ is chosen from $\{2, 3, \ldots, 20\}$. Furthermore, we set the third-mode dimension $r_3$ to the number of categories in the related dataset, as shown in Table 2. In our experiments, we let one of the parameters $r_1$, $r_2$, $r$, $k$, $\alpha$ vary over the grid given above while the rest are fixed, and the parameter values corresponding to the maximum scores are selected, as listed in Table 3.
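The one-parameter-at-a-time selection can be sketched as a simple coordinate-wise search. This is only an illustration: the callable `evaluate` is hypothetical and stands for a full clustering run that returns a score such as ACC, the sequential order in which parameters are swept is an assumption, and the toy scoring function below exists only to make the example runnable.

```python
import math

def one_at_a_time_search(evaluate, grids, start):
    """Vary one parameter over its grid while the others stay fixed,
    keep the best value, then move on to the next parameter."""
    best = dict(start)
    for name, values in grids.items():
        scored = [(evaluate({**best, name: v}), v) for v in values]
        best[name] = max(scored, key=lambda sv: sv[0])[1]
    return best

grids = {
    "alpha": [1e-3, 1e-2, 1e-1, 1, 10, 100, 1000],   # regularization grid from the text
    "k": [3, 4, 5, 6, 7],                            # nearest-neighbour grid from the text
}

def toy_score(p):
    """Hypothetical stand-in for a clustering score; peaks at alpha = 10, k = 5."""
    return -abs(math.log10(p["alpha"]) - 1.0) - (p["k"] - 5) ** 2

best = one_at_a_time_search(toy_score, grids, start={"alpha": 1, "k": 3})
```

Such a sweep needs far fewer runs than a full grid search, at the price of possibly missing interactions between parameters.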
In Figure 3, we show the effect of the parameters $\alpha$ and $k$ on the three indicators ACC, NMI, and PUR on different datasets. In subplots (a), (c), and (e) of Figure 3, the parameters other than $\alpha$ are taken as in Table 3. In subplots (b), (d), and (f) of Figure 3, the parameters other than $k$ are taken as in Table 3.

Experimental convergence study
In Section 3.3, we proved that the objective function of the HNTriD algorithm is non-increasing under the update rules. Here, we validate this result using the HNTriD convergence curves obtained on the six datasets, which are shown in Figure 4. Two key features can be identified from Figure 4. First, as the number of iterations increases, the objective function value of HNTriD decreases. Second, the curve declines rapidly and reaches a relatively stable state within thirty iterations. In summary, the experiments show that HNTriD converges well on the six datasets mentioned above.

Experimental comparison
To validate the effectiveness of the proposed HNTriD method, we compare it with some state-of-the-art methods under the assessment criteria ACC, NMI, and PUR. For HNTriD, the parameters are selected as in Table 3. For each method that uses manifold learning, the regularization parameter $\alpha$ is chosen from the grid $\{10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 100, 1000\}$, and the number of nearest neighbors $k$ is chosen from $\{3, 4, 5, 6, 7\}$. For methods based on TD, such as TriD and HyperNTF, we take the dimension of the third direction of the core tensor to be the class number of the original data. For methods based on tensor ring decomposition, such as HGNTR and LraHGNTR, we take the product of the first order and the third order to be the class number of the original data. The remaining parameters of the comparison algorithms are tuned on the grid used for HNTriD. First, we run a number of numerical tests to compare the clustering effect across different datasets. Second, a statistical significance comparison is performed on COIL20 and MNIST using the t-test. Third, we present 2-D visualizations of the clustering results of different methods on the COIL20 dataset by means of the t-SNE technique 46. Finally, we compare the time the methods take to finish the clustering tasks on the six real-world datasets.

Numerical comparison results
All experiments are run on the same sub-raw datasets, which are chosen at random from the corresponding database. Each experimental result is obtained after the process has been repeated 100 times. The numerical tests are performed in two steps. The first step is to choose a group of objects at random from the raw data and then decompose them into the corresponding sub-raw data based on the parametric form of the model. To keep the experimental results as close as possible to real-world clustering situations, we repeat the first step 10 times to obtain 10 groups of sub-raw data. In the second step, we use the K-means method to compute the evaluation index values for each group of sub-raw data. As before, we repeat the second step 10 times to obtain 10 evaluation values for each group of sub-raw data, which gives 100 evaluation values in total. The statistical significance comparisons are reported in Tables 7 and 8. According to the comparison tests shown in Table 7, our method is significantly superior to the compared methods on the COIL20 dataset. The results on MNIST show a clear decline relative to those on COIL20; nevertheless, across all evaluation metrics they remain competitive with the other approaches. Overall, Tables 7 and 8 indicate that our method achieves statistically significant improvements over the listed methods in most cases.

Visualization on clustering tasks
To visually demonstrate the clustering performance of HNTriD, we present cluster visualizations for several comparable approaches. In this experiment, we choose the COIL20 dataset as a representative example for comparative tests on clustering tasks and select 10 categories of data for analysis. The data are embedded in a two-dimensional space using t-SNE, and the clustering results are displayed in Figure 5 for visual comparison.
Figure 5 shows that the HNTriD method, when applied to the multiway dataset, effectively discerns the differences between data samples. HNTriD separates the sample clusters in the COIL20 dataset more clearly than the other approaches, some of which fail to completely separate samples from different clusters. This visualization supports the clustering comparison experiments above and confirms that HNTriD improves the learning capability on multiway data.

Running time comparison
The previous experimental results (numerical experiments, statistical significance comparison, and visualization on clustering tasks) show that the HNTriD model delivers better data analysis performance. However, the time cost must also be taken into account when applying mathematical models in real-life situations: a model is more useful in practice if it computes efficiently while preserving the quality of data analysis. We therefore measure the time cost and record in Figure 6 the running time of the clustering tasks for each method on the six datasets. On each dataset, we compare the computational time required by each method to complete the same numerical tests described in Subsection 4.6. Each bar in Figure 6 represents the total time needed for a method to complete the cluster analysis of a dataset, and different colors represent different algorithms. For example, the time cost of HNTriD on each dataset is shown in yellow.
We can draw the following conclusions from the bar graph: (i) Matrix-based decomposition methods are almost always faster than tensor-based ones. Matrix-based methods have an obvious advantage in running speed because their simpler algebraic form leaves fewer factors to update. (ii) Compared with general methods, manifold learning methods take longer to complete clustering tasks in most cases, because manifold learning algorithms require updating more parameters when clustering data. (iii) Compared with matrix-based algorithms, the HNTriD algorithm takes longer to complete clustering tasks. Given the computational complexity of the algorithm, this is consistent with our expectations; the increase in computational time is due to the construction of the hypergraph and the depiction of raw data. (iv) Among the tensor-based methods, the computation speed of HNTriD does not fall behind, while its superior performance is maintained.
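A per-method, per-dataset timing table of the kind summarized in the bar graph can be collected with a small harness such as the one below. This is only a sketch under stated assumptions: `time_methods` is a hypothetical helper, and the two lambda "methods" and toy datasets are placeholders, not the algorithms or data used in the paper.

```python
import time


def time_methods(methods, datasets):
    """Return {dataset_name: {method_name: elapsed_seconds}}."""
    timings = {}
    for data_name, data in datasets.items():
        timings[data_name] = {}
        for method_name, run in methods.items():
            start = time.perf_counter()
            run(data)  # the full clustering pipeline of one method
            timings[data_name][method_name] = time.perf_counter() - start
    return timings


# Hypothetical stand-ins for a cheap and a more expensive method.
methods = {
    "matrix_based": lambda data: sum(x * x for x in data),
    "tensor_based": lambda data: sum(x * y for x in data for y in data[:50]),
}
datasets = {"toy_small": list(range(200)), "toy_large": list(range(2000))}

timings = time_methods(methods, datasets)
for data_name, per_method in timings.items():
    for method_name, seconds in per_method.items():
        print(f"{data_name:10s} {method_name:13s} {seconds:.6f} s")
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock intended for measuring elapsed intervals; every method should be timed on identical inputs for the comparison to be fair.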

Conclusions
In this paper, the proposed HNTriD method performs well in multiway data learning because it combines the advantages of hypergraph learning and TriD. By constructing hypergraphs, it can reveal the complex, high-order structural information hidden in the raw data. Combined with the TriD model, it retains the multilinear structure of high-order data while mining the latent information within the data, and it has strong data clustering ability. Furthermore, we use the multiplicative update method to optimize the proposed HNTriD model, and experiments show that the new algorithm converges. The proposed algorithm is applied to six real-world datasets for clustering analysis (COIL20, GEORGIA, MNIST, ORL, PIE, and USPS), and the clustering results are compared with those of several existing algorithms. The experimental results demonstrate that the proposed HNTriD method is effective and that its running time is competitive among tensor-based methods. In our current work, the hypergraph does not change once it is generated, which may yield a less-than-ideal hypergraph for data with unexpected noise. Addressing this problem is outside the scope of the current work, and we hope to improve it in the future.

Figure 1. An example of a hypergraph and its incidence relationship.

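HNTriD aims to find three nonnegative tensors $\mathcal{A} \in \mathbb{R}_+^{r_1 \times r \times r}$, $\mathcal{B} \in \mathbb{R}_+^{r \times r_2 \times r}$, and $\mathcal{C} \in \mathbb{R}_+^{r \times r \times r_3}$ and three nonnegative factor matrices $U \in \mathbb{R}_+^{n_1 \times r_1}$, $V \in \mathbb{R}_+^{n_2 \times r_2}$, and $W \in \mathbb{R}_+^{n_3 \times r_3}$ such that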

Figure 2. A flowchart showing the implementation process of HNTriD in data analysis.

, and H. Calculate the matrix $S = H S_e D_e^{-1} H^{\top}$, and calculate the matrix $L = D_v - S$;
3: for $t = 1, 2, \cdots, \text{maxiter}$ do
4: Compute matrix $F$ as (8) by updated tensors $\mathcal{B}$ and $\mathcal{C}$
5:
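The hypergraph Laplacian step, $S = H S_e D_e^{-1} H^{\top}$ followed by $L = D_v - S$, can be illustrated with a small dependency-free sketch. The definitions used here are the usual hypergraph-learning conventions and are assumptions with respect to this paper: $H$ is the vertex-by-hyperedge incidence matrix, $S_e$ is the diagonal matrix of hyperedge weights, $D_e$ holds the hyperedge degrees (number of vertices per hyperedge), and $D_v$ holds the weighted vertex degrees. The function name `hypergraph_laplacian` and the toy hypergraph are hypothetical.

```python
def hypergraph_laplacian(H, edge_weights):
    """Return L = D_v - H S_e D_e^{-1} H^T for an n x m incidence matrix H."""
    n, m = len(H), len(edge_weights)
    # Hyperedge degree: number of vertices contained in each hyperedge.
    d_e = [sum(H[v][e] for v in range(n)) for e in range(m)]
    # Vertex degree: total weight of the hyperedges that contain the vertex.
    d_v = [sum(edge_weights[e] * H[v][e] for e in range(m)) for v in range(n)]
    # S[u][v] = sum over hyperedges e of w_e * H[u][e] * H[v][e] / d_e[e].
    S = [[sum(edge_weights[e] * H[u][e] * H[v][e] / d_e[e] for e in range(m))
          for v in range(n)]
         for u in range(n)]
    # L = D_v - S, with D_v diagonal.
    return [[(d_v[u] if u == v else 0.0) - S[u][v] for v in range(n)]
            for u in range(n)]


# Toy hypergraph: 4 vertices and 2 hyperedges, {0, 1, 2} and {2, 3}.
H = [[1, 0],
     [1, 0],
     [1, 1],
     [0, 1]]
L = hypergraph_laplacian(H, edge_weights=[1.0, 1.0])
for row in L:
    print([round(x, 3) for x in row])
```

Under these definitions every row of $L$ sums to zero and $L$ is symmetric, which is a quick sanity check when building the Laplacian from a k-nearest-neighbor hypergraph. The sketch assumes every hyperedge contains at least one vertex, so no hyperedge degree is zero.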

Figure 3. The clustering performance of the HNTriD model for different values of α and of the number of nearest neighbors k.

Figure 4. Convergence report of the proposed algorithm on six datasets.

Table 1. List of the notations relevant to this paper.
TriD are linear dimensionality reduction techniques that may miss the essential nonlinear data structure. Manifold learning, on the other hand, is an effective technique for discovering geometric structure in multiway data, and hypergraph learning is a promising manifold learning method.
Now, we are going to show that the update rule for $A_{(1)}$ shown in ( r 6 ).
Scientific Reports | (2024) 14:9098 | https://doi.org/10.1038/s41598-024-59300-3 | www.nature.com/scientificreports/

Datasets
The clustering performance is evaluated on six widely used datasets: COIL20, GEORGIA, MNIST, ORL, PIE, and USPS. The general statistical information of the datasets is summarized in Table 2, including the samples, sizes, and categories that were used in the numerical tests of this paper. A brief overview of the datasets is presented below.
• COIL20 (https://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php): A grayscale image dataset comprising photographs of 20 different objects, each photographed 72 times from different angles. After resizing each image to 32 × 32, we obtain a third-order tensor $\mathcal{Y} \in \mathbb{R}_+^{32 \times 32 \times 1{,}440}$.
• GEORGIA (http://www.anefian.com/research/face_reco.htm): A color JPG image dataset of 50 people, with 15 images per person taken against cluttered backgrounds. The images used in this paper have been converted to grayscale and resized to 32 × 32, which gives a third-order tensor $\mathcal{Y} \in \mathbb{R}_+^{32 \times 32 \times 750}$.
• MNIST (http://yann.lecun.com/exdb/mnist/): A handwritten digit image dataset in which each image is 28 × 28 in size. The MNIST dataset contains more than 60,000 digit images ranging from "0" to "9". In the numerical tests of this paper, we chose 100 images at random for each digit. Thus, the chosen images can be represented as a third-order tensor $\mathcal{Y} \in \mathbb{R}_+^{28 \times 28 \times 1{,}000}$.
• ORL (https://github.com/saeid436/Face-Recognition-MLP/tree/main/ORL): A dataset of 400 grayscale face images of 40 different people, collected with different facial expressions, various facial details, and varying lighting. Each image is 112 × 92 in size. A third-order tensor can be defined as $\mathcal{Y} \in \mathbb{R}_+^{112 \times 92 \times 400}$.
• PIE (http://www.ri.cmu.edu/projects/project_418.html): A dataset containing over 40,000 facial images collected from 68 different individuals. These images were taken in a variety of poses, lighting conditions, and expressions. We randomly selected 53 people with 22 different facial images each for our numerical tests, converted the images to grayscale, and resized them to 32 × 32. The selected images can then be expressed as a third-order tensor $\mathcal{Y} \in \mathbb{R}_+^{32 \times 32 \times 1{,}166}$.
• USPS (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#usps): A dataset of 11,000 grayscale handwritten digits (from "0" to "9"), each 16 × 16 in size. In the simulation tests of this paper, we chose 100 images at random for each digit. On this basis, we can build a third-order tensor $\mathcal{Y} \in \mathbb{R}_+^{16 \times 16 \times 1{,}000}$.

Table 2. Descriptions of the six datasets used in this paper.

Table 3. List of parameters' values corresponding to the maximum NMI of HNTriD on six datasets.
values of NMI in the experiments were recorded. The optimal parameters corresponding to each dataset are given in Table . When the parameter k is set to 4, 3, 5, 5, 6, and 4, the ACC, NMI, and PUR achieve better results on the COIL20, GEORGIA, MNIST, ORL, PIE, and USPS datasets, respectively.

Table 4. Quantitative clustering results (ACC% ± std%) of different methods on six datasets. Significant values are in bold.